28 research outputs found

    Multi-Stage Reinforcement Learning for Non-Prehensile Manipulation

    Full text link
    Manipulating objects without grasping them enables more complex tasks, known as non-prehensile manipulation. Most previous methods only learn one manipulation skill, such as reach or push, and cannot achieve flexible object manipulation.In this work, we introduce MRLM, a Multi-stage Reinforcement Learning approach for non-prehensile Manipulation of objects.MRLM divides the task into multiple stages according to the switching of object poses and contact points.At each stage, the policy takes the point cloud-based state-goal fusion representation as input, and proposes a spatially-continuous action that including the motion of the parallel gripper pose and opening width.To fully unlock the potential of MRLM, we propose a set of technical contributions including the state-goal fusion representation, spatially-reachable distance metric, and automatic buffer compaction.We evaluate MRLM on an Occluded Grasping task which aims to grasp the object in configurations that are initially occluded.Compared with the baselines, the proposed technical contributions improve the success rate by at least 40\% and maximum 100\%, and avoids falling into local optimum.Our method demonstrates strong generalization to unseen object with shapes outside the training distribution.Moreover, MRLM can be transferred to real world with zero-shot transfer, achieving a 95\% success rate.Code and videos can be found at https://sites.google.com/view/mrlm

    Modelling arterial pressure waveforms using Gaussian functions and two-stage particle swarm optimizer

    Get PDF
    Changes of arterial pressure waveform characteristics have been accepted as risk indicators of cardiovascular diseases. Waveform modelling using Gaussian functions has been used to decompose arterial pressure pulses into different numbers of subwaves and hence quantify waveform characteristics. However, the fitting accuracy and computation efficiency of current modelling approaches need to be improved. This study aimed to develop a novel two-stage particle swarm optimizer (TSPSO) to determine optimal parameters of Gaussian functions. The evaluation was performed on carotid and radial artery pressure waveforms (CAPW and RAPW) which were simultaneously recorded from twenty normal volunteers. The fitting accuracy and calculation efficiency of our TSPSO were compared with three published optimization methods: the Nelder-Mead, the modified PSO (MPSO), and the dynamic multiswarm particle swarm optimizer (DMS-PSO). The results showed that TSPSO achieved the best fitting accuracy with a mean absolute error (MAE) of 1.1% for CAPW and 1.0% for RAPW, in comparison with 4.2% and 4.1% for Nelder-Mead, 2.0% and 1.9% for MPSO, and 1.2% and 1.1% for DMS-PSO. In addition, to achieve target MAE of 2.0%, the computation time of TSPSO was only 1.5 s, which was only 20% and 30% of that for MPSO and DMS-PSO, respectively

    On-Policy Pixel-Level Grasping Across the Gap Between Simulation and Reality

    Full text link
    Grasp detection in cluttered scenes is a very challenging task for robots. Generating synthetic grasping data is a popular way to train and test grasp methods, as is Dex-net and GraspNet; yet, these methods generate training grasps on 3D synthetic object models, but evaluate at images or point clouds with different distributions, which reduces performance on real scenes due to sparse grasp labels and covariate shift. To solve existing problems, we propose a novel on-policy grasp detection method, which can train and test on the same distribution with dense pixel-level grasp labels generated on RGB-D images. A Parallel-Depth Grasp Generation (PDG-Generation) method is proposed to generate a parallel depth image through a new imaging model of projecting points in parallel; then this method generates multiple candidate grasps for each pixel and obtains robust grasps through flatness detection, force-closure metric and collision detection. Then, a large comprehensive Pixel-Level Grasp Pose Dataset (PLGP-Dataset) is constructed and released; distinguished with previous datasets with off-policy data and sparse grasp samples, this dataset is the first pixel-level grasp dataset, with the on-policy distribution where grasps are generated based on depth images. Lastly, we build and test a series of pixel-level grasp detection networks with a data augmentation process for imbalance training, which learn grasp poses in a decoupled manner on the input RGB-D images. Extensive experiments show that our on-policy grasp method can largely overcome the gap between simulation and reality, and achieves the state-of-the-art performance. Code and data are provided at https://github.com/liuchunsense/PLGP-Dataset

    Hybrid Cascade Structure for License Plate Detection in Large Visual Surveillance Scenes

    No full text

    Supplemental Boosting and Cascaded ConvNet Based Transfer Learning Structure for Fast Traffic Sign Detection in Unknown Application Scenes

    No full text
    With rapid calculation speed and relatively high accuracy, the AdaBoost-based detection framework has been successfully applied in some real applications of machine vision-based intelligent systems. The main shortcoming of the AdaBoost-based detection framework is that the off-line trained detector cannot be transfer retrained to adapt to unknown application scenes. In this paper, a new transfer learning structure based on two novel methods of supplemental boosting and cascaded ConvNet is proposed to address this shortcoming. The supplemental boosting method is proposed to supplementally retrain an AdaBoost-based detector for the purpose of transferring a detector to adapt to unknown application scenes. The cascaded ConvNet is designed and attached to the end of the AdaBoost-based detector for improving the detection rate and collecting supplemental training samples. With the added supplemental training samples provided by the cascaded ConvNet, the AdaBoost-based detector can be retrained with the supplemental boosting method. The detector combined with the retrained boosted detector and cascaded ConvNet detector can achieve high accuracy and a short detection time. As a representative object detection problem in intelligent transportation systems, the traffic sign detection problem is chosen to show our method. Through experiments with the public datasets from different countries, we show that the proposed framework can quickly detect objects in unknown application scenes

    Multi-Scale and Occlusion Aware Network for Vehicle Detection and Segmentation on UAV Aerial Images

    No full text
    With the advantage of high maneuverability, Unmanned Aerial Vehicles (UAVs) have been widely deployed in vehicle monitoring and controlling. However, processing the images captured by UAV for the extracting vehicle information is hindered by some challenges including arbitrary orientations, huge scale variations and partial occlusion. In seeking to address these challenges, we propose a novel Multi-Scale and Occlusion Aware Network (MSOA-Net) for UAV based vehicle segmentation, which consists of two parts including a Multi-Scale Feature Adaptive Fusion Network (MSFAF-Net) and a Regional Attention based Triple Head Network (RATH-Net). In MSFAF-Net, a self-adaptive feature fusion module is proposed, which can adaptively aggregate hierarchical feature maps from multiple levels to help Feature Pyramid Network (FPN) deal with the scale change of vehicles. The RATH-Net with a self-attention mechanism is proposed to guide the location-sensitive sub-networks to enhance the vehicle of interest and suppress background noise caused by occlusions. In this study, we release a large comprehensive UAV based vehicle segmentation dataset (UVSD), which is the first public dataset for UAV based vehicle detection and segmentation. Experiments are conducted on the challenging UVSD dataset. Experimental results show that the proposed method is efficient in detecting and segmenting vehicles, and outperforms the compared state-of-the-art works

    Multi-resolution attention convolutional neural network for crowd counting

    No full text
    Estimating crowd counts remains a challenging task due to the problems of scale variations, non-uniform distribution and complex backgrounds. In this paper, we propose a multi-resolution attention convolutional neural network (MRA-CNN) to address this challenging task. Except for the counting task, we exploit an additional density-level classification task during training and combine features learned for the two tasks, thus forming multi-scale, multi-contextual features to cope with the scale variation and non-uniform distribution. Besides, we utilize a multi-resolution attention (MRA) model to generate score maps, where head locations are with higher scores to guide the network to focus on head regions and suppress non-head regions regardless of the complex backgrounds. During the generation of score maps, atrous convolution layers are used to expand the receptive field with fewer parameters, thus getting higher-level features and providing the MRA model more comprehensive information. Experiments on ShanghaiTech, WorldExpo’10 and UCF datasets demonstrate the effectiveness of our method.Info-communications Media Development Authority (IMDA)Accepted versionThis work was supported in part by the National Natural Science Foundation of China under Grant 61673244, Grant 61273277 and Grant 61703240) and was carried out at the Rapid-Rich Object Search (ROSE) Lab at the Nanyang Technological University, Singapore. The ROSE Lab is supported by the Infocomm Media Development Authority, Singapore
    corecore